The Applicability of Minimum-Relative-Entropy Spectral Estimation:

An Analysis and Critique

1. Introduction

Minimum-relative-entropy spectral estimation (MRESE) [1] is based on a set of (approximate) values of the autocorrelation function (ACF) calculated from a finite sequence of observed values of a stationary random process, the power spectral density being the Fourier transform of the ACF according to the Wiener-Khinchin theorem [2]. MRESE assumes that the ACF values for longer lags are such as to minimize the relative entropy of the posterior ("final") spectral estimate (i.e., the estimate that takes the foregoing ACF values into account) with respect to the prior ("initial") estimate. This way of supplying the additional information needed to determine the spectrum is justified by a set of very appealing axioms [1] concerning the effect of new information in determining a posterior spectral estimate from a prior spectral estimate.

Since entropy is a measure of uncertainty, maximizing it ensures that as little as possible is assumed about the processes beyond what is known. In the case of a sinusoidal signal in gaussian noise, the signal is generally not gaussian, so the gaussian form of the relative entropy derived below does not strictly apply; the minimization of (6) is nonetheless desirable because of its ease, and, if the amplitude of the signal is unknown, the signal might reasonably be regarded as a sinusoidal gaussian random process. Regardless of the theoretical justifications for the use of MRESE, however, its value lies in the improved spectral resolution that it offers [5] in suitable applications.

The relative entropy of a (posterior or final) probability density function q(x) with respect to a (prior or initial) probability density function p(x) is

$$H(q,p) = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx,$$

where x can be a vector whose T/τ components represent the values of a signal plus noise at instants spaced uniformly by τ over an interval of length T. We shall take all logarithms to the base e. Thus, the relative entropy (also called the cross entropy, directed divergence, discrimination information, and Kullback-Leibler number) per sample is τH(q,p)/T natural units (nits), a nit being 1/ln 2 ≈ 1.4427 bits (binary units). Since H(q,p) is invariant under any invertible change of coordinates [3], it will have the same value if, instead of the components of x, we deal with the Fourier coefficients of the sequence of values or with the powers and the phase angles of the Fourier components. Since the sampling rate is 1/τ, the frequencies of these Fourier components extend from 0 up to the Nyquist frequency 1/(2τ); any higher-frequency components of the waveform would be aliased into this band by the sampling process.
For a gaussian random process, which, for any given spectrum, has the greatest entropy rate, these phase angles are independently and uniformly distributed. Their probability densities in the numerator and denominator of q(x)/p(x) therefore cancel out and can be ignored. The powers y of the Fourier components are independently exponentially distributed, with means given by the bandwidth 1/T times the power spectral density, say Q(f) for the posterior and P(f) for the prior. Substituting the probability density functions

$$p(y) = \frac{T}{P(f)}\,e^{-Ty/P(f)} \quad\text{and}\quad q(y) = \frac{T}{Q(f)}\,e^{-Ty/Q(f)} \qquad \text{for } y > 0 \text{ (and 0 otherwise)}$$

for the power of the Fourier component of frequency f into log[q(y)/p(y)] and averaging over the posterior distribution q(y), we find that these components contribute

$$\frac{Q(f)}{P(f)} - \log\frac{Q(f)}{P(f)} - 1$$

to the relative entropy of the Fourier component at frequency f. Note that, with u = Q(f)/P(f), the function u - log u - 1 is convex for u > 0, with its minimum (zero) at u = 1. In that neighborhood it is approximately (u - 1)²/2, and it rises to ∞ as u → 0 and as u → ∞, thus ensuring that Q(f) > 0.
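As a numerical sanity check of this per-component result, the sketch below draws powers y from the posterior exponential density and averages log[q(y)/p(y)], comparing the Monte Carlo mean with u - log u - 1. It is a minimal illustration in Python with NumPy; the function names and the chosen values of Q, P, and T are hypothetical.

```python
import numpy as np

def component_kl_monte_carlo(Q, P, T=1.0, n=200_000, seed=0):
    """Estimate E_q[log q(y)/p(y)] for the power y of one Fourier component,
    where q and p are exponential densities with means Q/T and P/T."""
    rng = np.random.default_rng(seed)
    y = rng.exponential(scale=Q / T, size=n)   # samples from the posterior q(y)
    log_q = np.log(T / Q) - T * y / Q          # log of (T/Q) exp(-T y / Q)
    log_p = np.log(T / P) - T * y / P          # log of (T/P) exp(-T y / P)
    return float(np.mean(log_q - log_p))

def component_kl_closed_form(Q, P):
    """u - log u - 1 with u = Q/P, in nits."""
    u = Q / P
    return float(u - np.log(u) - 1.0)

mc = component_kl_monte_carlo(Q=2.0, P=1.0, T=10.0)
exact = component_kl_closed_form(2.0, 1.0)     # equals 1 - ln 2
```

Because the random part of log[q(y)/p(y)] is linear in y, the Monte Carlo mean approaches the closed form at the usual 1/√n rate, and T drops out of the result.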


Approximating the sum over all frequencies (spaced by 1/T) by an integral, we find that the per-sample relative-entropy rate is

$$\tau\int_0^{1/(2\tau)}\left[\frac{Q(f)}{P(f)} - \log\frac{Q(f)}{P(f)} - 1\right]df,$$

which is τ times the Itakura-Saito distortion [4]. The factor τ here cancels the df dimensionally, so that this quantity is measured in nits; it has also been described as the normalized Itakura-Saito distortion. Hence, for gaussian random processes, minimization of the relative entropy means minimization of the Itakura-Saito distortion.
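The per-sample rate can be evaluated with a simple quadrature. The sketch below assumes one-sided densities on [0, 1/(2τ)] supplied as Python callables and uses the midpoint rule; the names are illustrative, not taken from any library.

```python
import numpy as np

def relative_entropy_rate(Q, P, tau, n=20_000):
    """Per-sample rate tau * integral_0^{1/(2 tau)} [Q/P - ln(Q/P) - 1] df in nits,
    by the midpoint rule. Q and P map a frequency array to one-sided densities."""
    df = 1.0 / (2.0 * tau * n)
    f = (np.arange(n) + 0.5) * df              # midpoints covering [0, 1/(2 tau)]
    u = Q(f) / P(f)
    return tau * float(np.sum(u - np.log(u) - 1.0) * df)

NITS_TO_BITS = 1.0 / np.log(2.0)               # about 1.4427 bits per nit

# A posterior that is everywhere twice the prior: each Fourier component
# contributes 1 - ln 2 nits, and there is one component per two samples.
rate = relative_entropy_rate(lambda f: 2.0 * np.ones_like(f),
                             lambda f: np.ones_like(f), tau=1e-3)
rate_bits = rate * NITS_TO_BITS
```

For a constant ratio Q/P the integral is exact, which makes the example a convenient check of the τ normalization.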

Section 2 discusses the extension of MRESE to the case where one has not only values of the ACF of the sum of two waveforms (signal and noise) but also prior estimates of their separate spectra as well as weights for the latter, and one wants posterior estimates for their separate spectra. Section 3 studies the limit as the weight for the prior estimate of the spectrum of a sinusoidal signal of unknown frequency is allowed to approach zero, because the broad prior estimate, reflecting ignorance as to its frequency, is obviously a very poor approximation to the spectrum of a sinusoid. Section 4 deals with the resulting posterior estimate of the signal's spectrum, and Sec. 5 with computation of estimates of the signal's frequency and strength. Section 6 discusses the inconsistency between the single-signal MRESE estimate of the spectrum of the sum of the two waveforms (signal and noise) and the sum of their separate estimates, and Sec. 7 determines the appropriate scale factors to be used when the prior estimates of the signal and noise spectra are subject to unknown scaling, as in the case where there may be an unknown but constant attenuation in each transmission path. Section 8 discusses various other problems yet to be resolved in regard to the application of MRESE, and finally Sec. 9 summarizes these problems and the progress that has been made in the preceding sections.


2. Multisignal MRESE

Multisignal minimum-relative-entropy spectral estimation [6] uses M + 1 values R(rτ) (with r = 0, 1, ..., M) of the autocorrelation function of the observed signal-plus-noise process, along with prior estimates P_s(f) and P_n(f) of the signal and noise spectra, respectively, and positive weights w_s and w_n [7] (possibly depending on the frequency f) for these prior estimates, or, more precisely, for the Itakura-Saito distortions

$$\tau\int_0^{1/(2\tau)}\left[\frac{Q_s(f)}{P_s(f)} - \log\frac{Q_s(f)}{P_s(f)} - 1\right]df \tag{1}$$

$$\tau\int_0^{1/(2\tau)}\left[\frac{Q_n(f)}{P_n(f)} - \log\frac{Q_n(f)}{P_n(f)} - 1\right]df \tag{2}$$

that they suffer when replaced by Q_s(f) and Q_n(f), to obtain the posterior spectral estimates

$$Q_s(f) = \left[\frac{1}{P_s(f)} + \frac{1}{w_s(f)}\sum_{r=0}^{M}\beta_r\cos 2\pi f r\tau\right]^{-1} \tag{3}$$

$$Q_n(f) = \left[\frac{1}{P_n(f)} + \frac{1}{w_n(f)}\sum_{r=0}^{M}\beta_r\cos 2\pi f r\tau\right]^{-1} \tag{4}$$

where the {β_r} (Lagrange multipliers used in the distortion minimization) are chosen so that

$$\int_0^{1/(2\tau)}\left[Q_s(f) + Q_n(f)\right]\cos 2\pi f r\tau\,df = R(r\tau) \tag{5}$$

for r = 0, 1, ..., M, where τ is the sampling interval. Because the second variation of the weighted Itakura-Saito distortion

$$\tau\int_0^{1/(2\tau)}\left\{w_s(f)\left[\frac{Q_s(f)}{P_s(f)} - \log\frac{Q_s(f)}{P_s(f)} - 1\right] + w_n(f)\left[\frac{Q_n(f)}{P_n(f)} - \log\frac{Q_n(f)}{P_n(f)} - 1\right]\right\}df \tag{6}$$

with respect to Q_s(f), viz., τw_s/Q_s²(f), is positive while the constraints (5) are linear in Q_s(f), a posterior spectrum of the form (3) obtained by setting the first variation equal to zero will not only minimize (1) but will yield the unique minimum. Similarly, a posterior spectrum of the form (4) uniquely minimizes (2).
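One way to find the multipliers {β_r} numerically for given priors and weights is a damped Newton iteration on the constraints (5), with backtracking that keeps both denominators positive. The sketch below takes (3) and (4) in the form 1/Q_i(f) = 1/P_i(f) + (1/w_i(f)) Σ_r β_r cos 2πfrτ for i = s, n (an assumption consistent with the second variation τw_s/Q_s² quoted above), a midpoint frequency grid, and prior spectra and weights supplied as arrays on that grid. The function names and the test values of β are hypothetical, and this is not the search procedure of [9].

```python
import numpy as np

def posterior_spectra(beta, Ps, Pn, ws, wn, C):
    """Forms (3) and (4). C[r, k] = cos(2 pi f_k r tau) on the frequency grid."""
    g = beta @ C                               # sum_r beta_r cos(2 pi f r tau)
    return 1.0 / (1.0 / Ps + g / ws), 1.0 / (1.0 / Pn + g / wn)

def solve_multisignal(R, Ps, Pn, ws, wn, tau=1.0, iters=60):
    """Damped Newton search for {beta_r} such that (5) holds for r = 0..M.
    Ps, Pn, ws, wn are sampled at the midpoints of len(Ps) cells of [0, 1/(2 tau)]."""
    n, M = len(Ps), len(R) - 1
    df = 1.0 / (2.0 * tau * n)
    f = (np.arange(n) + 0.5) * df
    C = np.cos(2 * np.pi * tau * np.outer(np.arange(M + 1), f))

    def residual(b):
        Qs, Qn = posterior_spectra(b, Ps, Pn, ws, wn, C)
        return C @ (Qs + Qn) * df - R, Qs, Qn

    def feasible(b):                           # both denominators must stay positive
        g = b @ C
        return np.all(1 / Ps + g / ws > 0) and np.all(1 / Pn + g / wn > 0)

    beta = np.zeros(M + 1)                     # beta = 0 gives Q_s = P_s, Q_n = P_n
    F, Qs, Qn = residual(beta)
    for _ in range(iters):
        if np.max(np.abs(F)) < 1e-12:
            break
        # Jacobian of (5): dQ_i/dbeta_r = -Q_i**2 cos(2 pi f r tau) / w_i
        J = -(C * (Qs**2 / ws + Qn**2 / wn)) @ C.T * df
        step = np.linalg.solve(J, -F)
        t = 1.0
        while True:                            # backtrack: stay feasible, reduce |F|
            trial = beta + t * step
            if feasible(trial):
                F_new, Qs_new, Qn_new = residual(trial)
                if np.linalg.norm(F_new) < np.linalg.norm(F) or t < 1e-10:
                    break
            t *= 0.5
        beta, F, Qs, Qn = trial, F_new, Qs_new, Qn_new
    return beta, f, Qs, Qn

# Usage: recover hypothetical multipliers from the ACF values they imply.
n = 4096
f_grid = (np.arange(n) + 0.5) / (2 * n)        # tau = 1
C3 = np.cos(2 * np.pi * np.outer(np.arange(3), f_grid))
Ps, Pn = 0.5 * np.ones(n), np.ones(n)          # flat priors
ws, wn = 0.5 * np.ones(n), np.ones(n)          # constant weights
beta_true = np.array([0.3, 0.4, -0.2])
Qs_true, Qn_true = posterior_spectra(beta_true, Ps, Pn, ws, wn, C3)
R = C3 @ (Qs_true + Qn_true) / (2 * n)         # left side of (5), df = 1/(2n)
beta_hat, f, Qs, Qn = solve_multisignal(R, Ps, Pn, ws, wn)
```

The Jacobian is the negative definite matrix -∫(Q_s²/w_s + Q_n²/w_n) cos 2πfkτ cos 2πfrτ df, so the Newton direction always reduces the residual norm for a small enough step, and with both weights positive the root is unique.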

The M + 1 conditions (5) on Q_s(f) + Q_n(f) imply that

$$Q_s(f) + Q_n(f) = 2\tau R(0) + 4\tau\sum_{r=1}^{\infty} R(r\tau)\cos 2\pi r f\tau,$$

with R(0), ..., R(Mτ) having the measured values and R((M+1)τ), R((M+2)τ), ... having the values that minimize (6). Thus, the known values of the autocorrelation function determine the gross form of the posterior total spectrum, while the fine structure is implied by the minimization of the weighted relative entropy.
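The inversion of (5) used here, Q(f) = 2τR(0) + 4τ Σ_{r≥1} R(rτ) cos 2πrfτ for a one-sided density on [0, 1/(2τ)], can be checked numerically: the sketch below builds a spectrum from a truncated ACF and integrates it back with the cosine kernel of (5). The AR(1) test ACF ρ^r, the value of τ, and the grid are illustrative assumptions.

```python
import numpy as np

def spectrum_from_acf(R, f, tau):
    """One-sided density 2 tau R(0) + 4 tau sum_{r>=1} R(r tau) cos(2 pi r f tau),
    using the lags supplied in R (R[r] = R(r tau))."""
    r = np.arange(1, len(R))
    return 2 * tau * R[0] + 4 * tau * np.cos(2 * np.pi * tau * np.outer(f, r)) @ R[1:]

def acf_from_spectrum(Q, f, df, tau, M):
    """Left side of (5): integral_0^{1/(2 tau)} Q(f) cos(2 pi f r tau) df, r = 0..M."""
    r = np.arange(M + 1)
    return np.cos(2 * np.pi * tau * np.outer(r, f)) @ Q * df

tau, n, rho = 0.5, 4096, 0.6
df = 1.0 / (2 * tau * n)
f = (np.arange(n) + 0.5) * df                  # midpoints of [0, 1/(2 tau)]
R = rho ** np.arange(200)                      # ACF of a unit-variance AR(1) process
Q = spectrum_from_acf(R, f, tau)
R_back = acf_from_spectrum(Q, f, df, tau, M=5)  # should reproduce R[0..5]
```

On this midpoint grid the cosine products integrate exactly, so the recovered values agree with ρ^r to rounding error even though the series was truncated.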

3. Small Weight for Prior Signal Spectrum

In some applications, such as HF radar, the signal may often be well approximated by a sinusoid of unknown strength and (Doppler) frequency. In such a case, the limit of (3) as w_s → 0 is of considerable interest because it has been observed [7] that, when w_s is small, Q_s(f) tends to be narrow and thus provides a sharp indication of the signal frequency f_s. When f_s is unknown and P_s(f) is therefore taken to be flat across the band of possible values, it should be appropriate to assign only a small weight to this clearly inaccurate broad estimate of a spectrum that is known to be narrow.

We cannot simply set w_s = 0, however, as there would then be no constraint on the normalized Itakura-Saito distortion (1), and so Q_s(f) would no longer have to remain positive. Another reason for a careful approach to the limit is that the forms (3) and (4) along with the constraints (5) do not uniquely determine the Lagrange multipliers {β_r}, despite the convexity of (1) and (2) and the consequent uniqueness of the minimum of (6). More than one set of values for the {β_r} in (3) and (4) will satisfy (5) because of the nonlinearity of (3) and (4) as functions of the {β_r}. In particular, as w_s → 0, there are generally two types of solutions: one with values for the {β_r} that likewise approach 0 and the other with values for the {β_r} that do not. In the latter case, Q_s(f) generally approaches 0 for every f, and the constraints (5) are satisfied by (4) alone. This solution, however, yields a positive weighted Itakura-Saito distortion (6), while the other, with β_r → 0, may make Q_n(f) = P_n(f) and may then satisfy (5) by giving (3) a suitable form, thus making (6) zero. [There will be no solution of the latter type, however, if there is no nonnegative Q_s(f) of the form (3) that, when added to Q_n(f) = P_n(f), satisfies (5), e.g., when P_n(f) is too large.]

Neither of these two types of spectral estimates is satisfactory, as one yields a vanishing posterior signal-spectrum estimate, and the other fails to modify the noise-spectrum estimate on the basis of the information (5). Accordingly, before letting w_s → 0, we impose the constraint

$$\int_0^{1/(2\tau)} Q_s(f)\,df = S \tag{7}$$

on Q_s(f) (with the help of the Lagrange multiplier λ) in addition to the constraints (5), so that the total estimated signal power will be S and cannot go to zero with w_s, even though we restrict our attention to posterior spectra of the form (3) and (4) with the {β_r} not all going to zero, so that Q_n(f) will be influenced by the ACF values (5). Because (7) is another linear constraint like (5), the minimization of the weighted Itakura-Saito distortion (6) still yields a unique set of posterior spectral estimates.

The effect of (7) is to add a constant λ to the β_0 in (3), but not to that in (4), with λ given the value that satisfies (7). [In the case of a flat w_s(f), a flat 1/P_s(f) can be absorbed into λ.] λ must be large enough to ensure that the denominator of (3) never becomes negative, but only barely large enough when w_s is small, for otherwise (3) will approach zero at all frequencies and will not satisfy (7). When w_s(f) is small, Q_s(f) then differs significantly from zero only near the frequency, say f_0, where λ plus the summation in its denominator is small, i.e., near the minimum of the summation, which is the same as that in (4). Hence, Q_s(f) automatically becomes a line spectrum with power S at frequency f_0, which is thus the minimum-relative-entropy estimate of the signal's frequency. As w_s(f) goes to zero at all frequencies, its effect and that of P_s(f) on Q_s(f) vanish, and it becomes unnecessary to choose a prior estimate or weighting function for the signal spectrum.
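To see the line-spectrum limit numerically, the sketch below fixes a hypothetical summation Σ_r β_r cos 2πfrτ with a single interior minimum, adds λ to β_0 as (7) requires, and solves (7) for λ by bisection, first for a moderate and then for a small w_s. It assumes flat P_s, constant w_s, and τ = 1, and writes (3) as Q_s = w_s/(w_s/P_s + λ + Σ_r β_r cos 2πfrτ); the names and numbers are illustrative.

```python
import numpy as np

def signal_posterior(g, Ps, ws, S, df):
    """Q_s(f) = w_s / (w_s/P_s + lambda + g(f)), i.e. form (3) with lambda added
    to beta_0, with lambda chosen by bisection so that (7) holds:
    sum(Q_s) * df = S.  g(f) = sum_r beta_r cos(2 pi f r tau) on the grid."""
    D0 = ws / Ps + g                           # w_s times the denominator of (3), lambda = 0
    base = D0 - D0.min()                       # lambda = -min(D0) + e**t keeps it positive
    def power(t):
        return np.sum(ws / (base + np.exp(t))) * df
    lo, hi = -60.0, 10.0                       # assumes power(lo) > S > power(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if power(mid) > S:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return ws / (base + np.exp(t)), -D0.min() + np.exp(t)

n = 200_000
df = 0.5 / n                                   # tau = 1, band [0, 1/2]
f = (np.arange(n) + 0.5) * df
c0 = np.cos(2 * np.pi * 0.15)
beta = np.array([0.5 + c0**2, -2 * c0, 0.5])   # hypothetical: sum equals (cos 2 pi f - c0)**2
g = beta @ np.cos(2 * np.pi * np.outer(np.arange(3), f))
near = np.abs(f - 0.15) < 0.01
frac, peak, total = {}, {}, {}
for ws in (1e-1, 1e-3):
    Qs, lam = signal_posterior(g, Ps=1.0, ws=ws, S=1.0, df=df)
    frac[ws] = np.sum(Qs[near]) / np.sum(Qs)   # share of S within 0.01 of f = 0.15
    peak[ws] = f[np.argmax(Qs)]
    total[ws] = np.sum(Qs) * df
```

With w_s = 10⁻³ a much larger share of the power S lies within 0.01 of the minimum at f = 0.15 than with w_s = 0.1, and the peak location is set by the summation alone, independent of P_s.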

Since, with (7) included, the {β_r} do not approach zero, Q_n(f) is able to reflect the information contained in (5). Q_s(f), too, is able to reflect this information, because (3) then differs from zero only where the quantity multiplying 1/w_s in its denominator is small (on account of the inclusion of λ). Thus, the resulting posterior spectral estimates have the desirable aspects of both the solution for which β_r → 0 and that for which it does not, and Q_s(f) has the right form for a sinusoidal signal.

4. Posterior Signal Spectrum

When w_s is small but still positive, the principal denominator of (3) has a global minimum at, say, f_0 that does not quite reach zero, and in that neighborhood it is approximately a quadratic function of the frequency f. Hence, Q_s(f) has the shape of a witch of Agnesi (like the Cauchy density function p(x) = π⁻¹/(1 + x²)) centered at f_0. The height of the witch is proportional to S²/w_s, and its width is proportional to w_s/S when w_s is small. As long as S is small, the apportionment of this amount of power to the posterior signal spectrum Q_s(f) will hardly affect the posterior noise spectrum Q_n(f), and the latter can be determined in the single-signal manner [1] just as if no signal were present, i.e., as (4) with the {β_r} determined by (5) with Q_s(f) = 0.
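The scaling of the witch's height and width can be checked with a short calculation. It assumes that w_s and P_s are constant near f_0, writes (3) with λ included as Q_s = w_s/(w_s/P_s + λ + Σ_r β_r cos 2πfrτ), and introduces, for illustration only, a for the small positive minimum of that denominator and b for half its curvature at f_0:

$$Q_s(f) = \frac{w_s}{w_s/P_s + \lambda + \sum_{r=0}^{M}\beta_r\cos 2\pi f r\tau} \approx \frac{w_s}{a + b\,(f-f_0)^2},$$

$$S = \int Q_s(f)\,df \approx \int_{-\infty}^{\infty}\frac{w_s\,dx}{a + b\,x^2} = \frac{\pi w_s}{\sqrt{ab}} \quad\Longrightarrow\quad a = \frac{\pi^2 w_s^2}{b\,S^2},$$

$$Q_s(f_0) = \frac{w_s}{a} = \frac{b\,S^2}{\pi^2 w_s}, \qquad \text{half-width} = \sqrt{a/b} = \frac{\pi w_s}{b\,S}.$$

Since b is fixed by the {β_r} (and hence, for small S, by the noise spectrum), the height grows as S²/w_s and the width shrinks as w_s/S. Extending the integral to ±∞ is justified only when the half-width is small compared with the band 1/(2τ).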

When S is significant in comparison with the total noise power, its effect on the {β_r} must be taken into account, as it will contribute S cos 2πrf_0τ to (5). In any case, the location f_0 of the global minimum of

$$w_n(f)\left[\frac{1}{Q_n(f)} - \frac{1}{P_n(f)}\right] \tag{8}$$

[which is the summation in (3)] is the MRESE estimate of the frequency f_s of the signal. Without the condition (7) (and without λ), (8) might remain positive, making Q_s(f) → 0 at all frequencies, or it might go to zero at some frequency f_0, giving the posterior signal power at that frequency the indeterminate value 0/0. Thus, (7) resolves any such indeterminacy.


5. Estimation of Signal Strength and Frequency

It remains to find a suitable way to choose S when the signal strength is unknown (and may be 0) and a way to assign an accuracy to the estimate f_0 of the signal's frequency f_s, as well as to provide a measure of the reliability of the determination that a signal is present or absent. A possible way to estimate the signal power S would be to determine the S that minimizes the weighted Itakura-Saito distortion (2), to which (6) reduces when w_s = 0. Experimental testing may show whether this approach is useful. If the resulting value of S exceeds some threshold, it can be deemed to indicate the presence of a signal, and otherwise its absence.

The second derivative of (6) with respect to S at its minimum (with w_s = 0) may be inversely proportional to the variance of this estimate of S, since, for a normal distribution, the relative entropy resulting from a shift equals half the ratio of the square of that shift to the variance. More generally, the second derivative at zero shift of the relative entropy of a distribution with respect to a shifted version, which is the Fisher information of that distribution [8, p. 1010], cannot, by the Cramér-Rao inequality [8, p. 943], be less than the reciprocal of the variance, with equality only in the normal case; the reciprocal of this second derivative therefore provides a lower bound for the variance.
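Both facts can be checked numerically for a normal and a non-normal (Laplace) density: the relative entropy with respect to a shifted copy, and its second derivative at zero shift. The sketch below uses the trapezoidal rule on a wide grid and a central second difference; the densities, grid, and step h are illustrative choices.

```python
import numpy as np

def kl_to_shifted(logpdf, delta, x):
    """Relative entropy of p with respect to its copy shifted by delta,
    integral p(x) [log p(x) - log p(x - delta)] dx, by the trapezoidal rule."""
    lp = logpdf(x)
    y = np.exp(lp) * (lp - logpdf(x - delta))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def curvature_at_zero_shift(logpdf, x, h=1e-2):
    """Central second difference of the relative entropy at zero shift
    (the relative entropy itself is 0 there)."""
    return (kl_to_shifted(logpdf, h, x) + kl_to_shifted(logpdf, -h, x)) / h**2

x = np.linspace(-60.0, 60.0, 600_001)          # grid step 2e-4
sigma, b = 2.0, 1.0

def gauss(t):                                  # log density of N(0, sigma**2)
    return -0.5 * (t / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def laplace(t):                                # log density of Laplace(0, b), variance 2 b**2
    return -np.abs(t) / b - np.log(2 * b)

kl_gauss = kl_to_shifted(gauss, 0.5, x)              # theory: 0.5**2 / (2 sigma**2)
fisher_gauss = curvature_at_zero_shift(gauss, x)     # theory: 1/sigma**2 = 1/variance
fisher_laplace = curvature_at_zero_shift(laplace, x) # theory: 1/b**2 > 1/(2 b**2)
```

For the normal density the curvature equals the reciprocal of the variance; for the Laplace density it is close to 1/b², about twice the reciprocal of its variance 2b², in keeping with the inequality.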

The determination of the signal-frequency estimate f_0 is straightforward when S is small, but an iterative procedure appears necessary for larger S. For this purpose the f_0 for small S can be used to subtract S cos 2πrf_0τ from the initial values of R(rτ). A new f_0 can then be obtained from the minimum of the resulting function (8), and the process can be continued for an increasing sequence of values of S. This procedure will yield a unique estimate of the signal's frequency even though it might otherwise not be unique. The iterative estimation of f_s as the total signal power S is gradually increased from zero may well require less computing time than the search [9] for the spectra (3) and (4) that satisfy (5), and it has the further advantage of needing neither a prior signal spectrum nor relative weights for the prior signal and noise spectra. It remains to be seen how well this approach works and in what sort of applications it performs best. (See also [10].)

The iterative process for determining the estimate f_0 of the signal frequency f_s and the estimate Q_n(f) of the noise spectrum begins by computing the initial estimate Q_n^(0)(f) from P_n(f) and the observed values of R(rτ) via the Levinson technique utilized in the second algorithm of [9], and then determining the initial estimate f_0^(0) as that f for which w_n(f)[1/Q_n^(0)(f) - 1/P_n(f)] is least. At the ith succeeding stage of the iteration, the observed values of R(rτ) are reduced by S^(i) cos 2πrf_0^(i-1)τ, Q_n^(i)(f) is computed from them and P_n(f), and f_0^(i) is determined as the f that minimizes w_n(f)[1/Q_n^(i)(f) - 1/P_n(f)]. The iteration is continued for a sequence of values S^(i) of S increasing from S^(0) = 0 until S^(i) reaches the desired total signal power S, with decreasing increments as this value is approached.
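A compact sketch of this iteration follows, under the simplifying assumption that P_n(f) and w_n(f) are flat. In that case the single-signal estimate Q_n^(i)(f) from the reduced ACF values is the maximum-entropy (autoregressive) spectrum, which the standard Levinson-Durbin recursion provides, and minimizing w_n(f)[1/Q_n^(i)(f) - 1/P_n(f)] amounts to locating the peak of Q_n^(i)(f). The schedule S^(i) = S[1 - (1 - i/K)²], the grid search, the test ACF, and all names are hypothetical choices, not the algorithm of [9].

```python
import numpy as np

def levinson(R):
    """Levinson-Durbin recursion: prediction-error filter a (a[0] = 1) and
    final prediction-error power for autocorrelation values R[0..M]."""
    M = len(R) - 1
    a = np.zeros(M + 1)
    a[0] = 1.0
    err = float(R[0])
    for m in range(1, M + 1):
        k = -(R[m] + a[1:m] @ R[m - 1:0:-1]) / err
        if abs(k) >= 1.0:
            raise ValueError("ACF values are not positive definite")
        a_prev = a.copy()
        a[1:m + 1] = a_prev[1:m + 1] + k * a_prev[m - 1::-1]
        err *= 1.0 - k * k
    return a, err

def estimate_frequency(R, S_total, Pn=1.0, wn=1.0, tau=1.0, steps=20, n=8192):
    """Iterate f_0 as the signal power S^(i) grows from 0 to S_total, assuming
    flat P_n and w_n so that each Q_n^(i) is the maximum-entropy (AR) spectrum."""
    R = np.asarray(R, dtype=float)
    r = np.arange(len(R))
    f = (np.arange(n) + 0.5) / (2 * tau * n)
    E = np.exp(-2j * np.pi * tau * np.outer(f, r))   # e^{-i 2 pi f r tau}

    def noise_spectrum(Rr):                    # one-sided density 2 tau err / |A|^2
        a, err = levinson(Rr)
        return 2 * tau * err / np.abs(E @ a) ** 2

    def argmin_eq8(Qn):                        # f minimizing w_n (1/Q_n - 1/P_n)
        return f[np.argmin(wn * (1.0 / Qn - 1.0 / Pn))]

    Qn = noise_spectrum(R)                     # stage 0: S^(0) = 0
    f0 = argmin_eq8(Qn)
    for i in range(1, steps + 1):
        S_i = S_total * (1.0 - (1.0 - i / steps) ** 2)   # shrinking increments
        Qn = noise_spectrum(R - S_i * np.cos(2 * np.pi * r * f0 * tau))
        f0 = argmin_eq8(Qn)
    return f0, Qn

# Hypothetical test ACF: unit white noise plus a sinusoid of power 1 at f = 0.2.
lags = np.arange(11)                           # M = 10, tau = 1
R_obs = (lags == 0).astype(float) + np.cos(2 * np.pi * 0.2 * lags)
f0_hat, Qn_hat = estimate_frequency(R_obs, S_total=0.5)
```

In this white-noise example the schedule stops short of the sinusoid's actual power: once the whole line has been subtracted, the reduced ACF values are those of white noise, (8) becomes flat, and f_0 is no longer determined. Subtracting more than the line's power can also make the reduced values fail to be positive definite, in which case the sketch raises an error.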


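6. Consistency of Multiple-Signal and Single-Signal Spectral Estimates

One might expect the sum of the spectra (3) and (4) to equal the estimate

$$Q(f) = \left[\frac{1}{P_s(f) + P_n(f)} + \sum_{r=0}^{M}\beta'_r\cos 2\pi r f\tau\right]^{-1}, \tag{9}$$

where the {β'_r} are the corresponding Lagrange multipliers, of the spectrum of the total observed process subject to the same constraints

$$\int_0^{1/(2\tau)} Q(f)\cos 2\pi r f\tau\,df = R(r\tau), \qquad r = 0, 1, \ldots, M. \tag{10}$$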

A simple example (M = 1, flat prior spectral estimates, and constant weights) suffices to show, however, that (9) is in general not the sum of (3) and (4). When S is very small, the posterior signal spectrum has very little effect on the total posterior spectrum, and there is almost no difference between (9) and the sum of (3) and (4). When w_s is small but S is not, however, Q_s(f) includes a substantial spectral line while (9) does not, and so there is again a significant difference between the sum of the multisignal minimum-relative-entropy estimates and the single-signal estimate. The knowledge that a spectral line may be present fundamentally alters the nature of the estimate.
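The M = 1 example can be made concrete with a short check. Any estimate of the form (9) has a reciprocal that is a trigonometric polynomial of degree M in f, whereas the reciprocal of the sum of (3) and (4) generally is not, so the sum cannot be of the form (9). The sketch below assumes (3) and (4) in the form 1/Q_i = 1/P_i + (1/w_i) Σ_r β_r cos 2πfrτ, with hypothetical flat priors P_s = 1 and P_n = 2, unit weights, τ = 1, and β = (0, 0.3), and computes cosine coefficients of the reciprocals.

```python
import numpy as np

n = 4096
f = np.arange(n) / n                           # one full period in f (tau = 1)

def cosine_coefficients(values, kmax):
    """c_k such that values = c_0 + sum_k c_k cos(2 pi k f), k = 0..kmax;
    exact on this uniform periodic grid for low-degree trigonometric polynomials."""
    out = [np.mean(values)]
    out += [2 * np.mean(values * np.cos(2 * np.pi * k * f)) for k in range(1, kmax + 1)]
    return np.array(out)

g = 0.3 * np.cos(2 * np.pi * f)                # beta_0 = 0, beta_1 = 0.3
Qs = 1.0 / (1.0 / 1.0 + g)                     # form (3) with P_s = 1, w_s = 1
Qn = 1.0 / (1.0 / 2.0 + g)                     # form (4) with P_n = 2, w_n = 1
coef_sum = cosine_coefficients(1.0 / (Qs + Qn), 3)
coef_form9 = cosine_coefficients(1.0 / 3.0 + g, 3)   # reciprocal of a form-(9) spectrum
```

The cos 4πf coefficient of 1/(Q_s + Q_n) is clearly nonzero, while that of any reciprocal of the form (9) with M = 1 vanishes, so here (9) cannot equal the sum of (3) and (4).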

Another approach [11] to multiple-signal relative-entropy spectral estimation, however, begins with the assumption that the sum of the posterior spectra and cross-spectra is the MRESE of the sum of the signal and noise based on the prior estimate of the spectrum of the sum. Musicus and Johnson divide the difference between the total posterior and total prior spectra among the posterior signal and noise spectra and cross-spectra by what is, in effect, Wiener filtering. The resulting posterior signal and noise spectra therefore differ from (3) and (4), and cross-spectra arise even in the absence of prior cross-correlation. This approach does not incorporate weighting, and so it is not possible to remove the influence of the prior signal-spectrum estimate upon its posterior spectral estimates.
7. Unknown Scales for Prior Spectra

In some applications, such as the processing of noisy speech, the signal and noise will suffer unknown constant power gains of 10 log10 γ_s and 10 log10 γ_n decibels. Here it may be appropriate to use γ_s P_s(f) and γ_n P_n(f) as the prior spectral estimates instead of P_s(f) and P_n(f). Substituting these into (1), (2), and (6) and minimizing (6) under the constraints (5) with respect to Q_s(f) and Q_n(f), we get (3) and (4) with these substitutions for P_s(f) and P_n(f). Setting the derivatives of (6) with respect to γ_s and γ_n equal to zero, we find

$$\gamma_i = \frac{\int_0^{1/(2\tau)} w_i(f)\,Q_i(f)/P_i(f)\,df}{\int_0^{1/(2\tau)} w_i(f)\,df} \tag{11}$$

for i = s and n. Hence,

$$Q_i(f) = \left[\frac{\int_0^{1/(2\tau)} w_i(f')\,df'}{P_i(f)\int_0^{1/(2\tau)} w_i(f')\,Q_i(f')/P_i(f')\,df'} + \frac{1}{w_i(f)}\sum_{r=0}^{M}\beta_r\cos 2\pi f r\tau\right]^{-1} \tag{12}$$

for i = s and n. Since the second partial derivative of (6) with respect to 1/γ_i is

$$\tau\gamma_i^2\int_0^{1/(2\tau)} w_i(f)\,df > 0,$$

we see not only that (11) yields a minimum for (6) but also that this minimum is unique.


If either prior spectral density P_i(f) is independent of frequency, the effect of (11) is simply to replace it in (3) or (4) with the value of the corresponding spectral estimate Q_i(f) averaged over the frequency band from 0 to 1/(2τ) with the weighting w_i(f). In the case of noise alone with a flat w_n(f), this average value of Q_n(f) is just 2τR(0). Otherwise it is necessary to search for the values of γ_s and γ_n that are consistent with (12), as well as for the values of the {β_r} that cause the sum of (12) for i = s and for i = n to satisfy (5).

To simplify the computation, one might instead have chosen γ_s and γ_n to normalize the prior spectral estimates, i.e., to give each of them unit total power. This procedure, however, would in general increase (6), and the choice of a unit total power would introduce an inadmissible arbitrary effect on the result.
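The sketch below illustrates, for a fixed posterior spectrum, that the scale factor γ = ∫ w Q/P df / ∫ w df minimizes the corresponding weighted Itakura-Saito term of (6), and that the unit-power normalization gives a larger value. Holding Q_i fixed is a simplification (in the full problem Q_i and γ_i must be found jointly), and the spectra, weight, and names are hypothetical.

```python
import numpy as np

def weighted_is(Q, P, w, df, tau=1.0):
    """One term of (6): tau * integral of w [Q/P - ln(Q/P) - 1] df (midpoint rule)."""
    u = Q / P
    return tau * float(np.sum(w * (u - np.log(u) - 1.0)) * df)

def optimal_gamma(Q, P, w):
    """gamma = integral(w Q / P) / integral(w) on a uniform grid (df cancels)."""
    return float(np.sum(w * Q / P) / np.sum(w))

n = 2000
df = 0.5 / n                                   # tau = 1, band [0, 1/2]
f = (np.arange(n) + 0.5) * df
Q = 1.0 / (1.2 - np.cos(2 * np.pi * f))        # hypothetical posterior spectrum
P = 1.0 + 0.5 * np.cos(2 * np.pi * f)          # hypothetical prior shape
w = np.ones(n)

def D(gamma):                                  # distortion with the scaled prior gamma * P
    return weighted_is(Q, gamma * P, w, df)

g_star = optimal_gamma(Q, P, w)
g_unit = 1.0 / (np.sum(P) * df)                # scale giving gamma * P unit total power
```

With a flat prior, γP reduces to the w-weighted average of Q over the band, which is the replacement described in the text.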


8. Problems Remaining to be Resolved

8.1. Inexactness of Autocorrelation-Function Values

Although MRESE appears to be based on a solid logical foundation [1], it must be recognized that ACF values computed from a finite record will not exactly equal the true values of the autocorrelation function. Thus, not only are ACF values for lags longer than the available record unknown, but those for shorter lags (especially for only slightly shorter lags) are known imprecisely, contradicting an assumption of the MRESE approach. Efforts [12], [13] have been made to take into account the inexactitude of ACF values by imposing an upper bound on a quadratic function of their departures from the initial estimates and then determining the set of ACF values which, subject to this condition, minimizes the relative entropy of the posterior spectral estimate with respect to the prior.

8.2. Logical Assignment of Weights to Prior Spectral Estimates

While the procedure just described produces a unique posterior spectral estimate, it supposes that the corrections of the approximate ACF values should combine to minimize the relative entropy. It is equally likely, however, that they will combine to maximize it; but it is much more likely that they will move the ACF vector (the set of ACF values) in a nearly orthogonal direction that hardly affects the relative entropy but may have significant effects upon the shape of the posterior spectral estimate. Thus, a better idea of the implications of the approximate nature of the ACF values should be obtainable by exploring the ACF-vector space in the neighborhood of the estimated point, and attaching to each spectral value an uncertainty given by the variety of resulting posterior spectral values.
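One way to carry out such an exploration is by Monte Carlo: perturb the ACF values, refit the posterior spectrum for each perturbation, and summarize the spread at each frequency. The sketch below assumes a flat prior (so that each refit is the maximum-entropy spectrum, computed by the Levinson-Durbin recursion), independent Gaussian errors of a hypothetical standard deviation sigma on each lag, and rejection of draws that are not positive definite; the function names and percentile choices are illustrative.

```python
import numpy as np

def levinson(R):
    """Levinson-Durbin recursion; returns (a, err), or None if the ACF values
    are not positive definite (some reflection coefficient |k| >= 1)."""
    M = len(R) - 1
    a = np.zeros(M + 1)
    a[0] = 1.0
    err = float(R[0])
    if err <= 0:
        return None
    for m in range(1, M + 1):
        k = -(R[m] + a[1:m] @ R[m - 1:0:-1]) / err
        if abs(k) >= 1.0:
            return None
        a_prev = a.copy()
        a[1:m + 1] = a_prev[1:m + 1] + k * a_prev[m - 1::-1]
        err *= 1.0 - k * k
    return a, err

def spectral_spread(R, sigma, draws=400, n=256, tau=1.0, seed=0, max_tries=10_000):
    """Perturb R[0..M] with independent N(0, sigma^2) errors, refit the flat-prior
    (maximum-entropy) spectrum for each draw, and return the 10th, 50th and
    90th percentiles of the spectral values at each frequency."""
    rng = np.random.default_rng(seed)
    R = np.asarray(R, dtype=float)
    f = (np.arange(n) + 0.5) / (2 * tau * n)
    E = np.exp(-2j * np.pi * tau * np.outer(f, np.arange(len(R))))
    spectra = []
    for _ in range(max_tries):
        fit = levinson(R + sigma * rng.standard_normal(len(R)))
        if fit is None:
            continue                           # reject non-positive-definite draws
        a, err = fit
        spectra.append(2 * tau * err / np.abs(E @ a) ** 2)
        if len(spectra) == draws:
            break
    lo, med, hi = np.percentile(np.array(spectra), [10, 50, 90], axis=0)
    return f, lo, med, hi

f, lo, med, hi = spectral_spread(0.8 ** np.arange(6), sigma=0.02)
```

The band between the 10th and 90th percentiles is one possible per-frequency measure of the uncertainty described above; the cost grows with the number of draws, which reflects the extra computation noted in the next paragraph.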

Such a procedure, however, would involve a far greater amount of computation than the more naive MRESE approaches, but it could provide information as to the accuracy of the posterior spectral estimate and could thus yield a posterior weighting function that might enable the posterior spectral estimate to be used as a prior estimate with new observations in the same way as the previous prior estimate [7]. It may, on the other hand, be necessary to use each of a large number of previous posterior spectra as the new prior in order to obtain a reliable picture of the new posterior spectra implied by them.

Apart from this approach (which ignores correlations between errors in the values of the ACF for different lags) and that of Secs. 3 and 4 with w_s → 0, a logical basis is needed for assigning weights to the prior spectral estimates for the signal and the noise, just as the reciprocals of the variances of the residuals serve as weights for overdetermined linear equations. It remains to be seen whether weighting inversely proportional to the variance of the prior estimate at each frequency (but still ignoring covariances) might yield satisfactory posterior spectral estimates.

8.3. Criterion for Admissibility of Prior Spectral Estimates

As formulated, MRESE imposes no requirements concerning the accuracy or reliability of the prior spectral estimates that it uses, and so there is as yet no logical reason to prefer one prior estimate (such as the maximum-entropy flat estimate) over another. To avoid this arbitrariness and to ensure the usefulness of the resulting posterior estimates, it is necessary that the prior estimates have adequate credentials. Suitably assigned and utilized weights might provide a basis for the admissibility of prior estimates. In fact, it might turn out that the classical spectral estimate [14] based on the same ACF data could serve well as a prior for obtaining a much sharper posterior minimum-relative-entropy estimate.

8.4. The Relative Nature of Weighting

By introducing the weighting functions w_s(f) and w_n(f) for the Itakura-Saito distortions (1) and (2), we are able to place greater or lesser reliance on P_s(f) or P_n(f) at each frequency. These weights are only relative, however, as multiplication of both by the same constant will leave the posterior spectral estimates unchanged: there is as yet no way to introduce absolute measures of the credibilities of the prior estimates. An absolute standard with respect to which weights might somehow be expressed is the flat prior estimate that underlies maximum-entropy spectral estimation. It might be appropriate for the posterior spectral estimate based on such a prior to be independent of any weighting, as the constant spectral density already represents maximum entropy, that is, maximum uncertainty as to the spectrum. Although (3) and (4) depend on w_s(f) and w_n(f) even in this case, a new approach to weighting might remove that dependence.
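The invariance under a common rescaling can be seen directly if (3) and (4) are taken in the form 1/Q_i(f) = 1/P_i(f) + (1/w_i(f)) Σ_r β_r cos 2πfrτ (an assumption about their exact form). Multiplying both weights by a constant c > 0 and every multiplier β_r by the same c gives

$$\left[\frac{1}{P_i(f)} + \frac{1}{c\,w_i(f)}\sum_{r=0}^{M}(c\,\beta_r)\cos 2\pi f r\tau\right]^{-1} = \left[\frac{1}{P_i(f)} + \frac{1}{w_i(f)}\sum_{r=0}^{M}\beta_r\cos 2\pi f r\tau\right]^{-1}, \qquad i = s, n,$$

so the scaled multipliers {cβ_r} satisfy (5) whenever {β_r} do, and the posterior estimates are unchanged. Equivalently, (6) is simply multiplied by c, which does not move its minimizer. Scaling only one of the two weight functions, by contrast, changes the ratio w_s/w_n and therefore the estimates.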

8.5. Other Methods of Spectral Estimation

Many other forms of spectral estimation besides MRESE and classical methods [14] have been devised [15]-[19], each with its own devotees, who find it well suited to particular applications. When a spectrum can be assumed to be described by a finite number of unknown parameters, its estimation can be treated as a parameter-estimation problem. MRESE performs just such a parameter fitting and does it very well for autoregressive spectra of the form (3) and (4), to which it leads, especially when the order M of the autoregressive process is known. A new approach [20] to maximum-entropy spectral analysis offers confidence intervals that may help to resolve some of the weighting problems mentioned above. For estimating other kinds of spectra, on the other hand, another method may turn out to be superior to MRESE. It needs to be determined experimentally why and in what cases MRESE does well.

8.6. Iterative Use of MRESE

In OTH-radar detection, MRESE has been found misleading at low signal-to-clutter ratios if the posterior estimates of the signal and clutter spectra (3) and (4) are used as new priors [21], since a weak signal can then show up more and more in the posterior clutter spectrum rather than in the posterior signal spectrum, thus escaping detection. To avoid this outcome, it seems preferable not to iterate but, instead, to use the average of all of the autocorrelation values for each lag, or to average all of the posterior spectra obtained without iterating. Experience should show when such an approach is needed and when iteration is satisfactory.

8.7.  Limitation  on  the  Order  M  of  the  Autoregressive  Spectrum 

In  applications  it  has  been  found  necessary  to  choose  the  right  order  M  for  the 
autoregressive  filter  whose  frequency  response  represents  the  minimum-relative-entropy 
or  the  maximum-entropy  estimate  of  the  spectrum  of  interest,  as  too  large  an  M  yields 
too  many  spectral  peaks  and  valleys,  features  that  exhibit  no  stability  with  repeated 
analysis  or  with  increasing  M.  MRESE  therefore  makes  use  of  only  a  limited  number  of 
values  R(0),  ...,  R(Mτ)  of  the  ACF.  The  information  contained  in  the  estimates  of 
the  ACF  for  longer  lags  is  ignored,  and  it  seems  desirable  to  find  a  way,  within  the 
MRESE  framework,  to  utilize  this  additional  information. 
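In the special case of a uniform (flat) prior, MRESE reduces to maximum-entropy spectral estimation, whose estimate from R(0), ..., R(Mτ) is the spectrum of an autoregressive process of order M. A minimal sketch of that computation follows, assuming a real-valued ACF and using the Levinson-Durbin recursion to solve the Yule-Walker equations. The function names and the predictor coefficients a_1, ..., a_M are notation introduced here for illustration; the spectrum is normalized so that its integral over |f| ≤ 1/(2τ) equals R(0).

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR(order) model.

    r holds R(0), R(tau), ..., R(order*tau) (assumed real-valued).
    Returns predictor coefficients a_1..a_M (x[n] ~ sum_k a_k x[n-k])
    and the final prediction-error power.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)          # a[0] unused; a[1..order] filled in
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / err
        prev = a.copy()
        a[m] = k
        for j in range(1, m):
            a[j] = prev[j] - k * prev[m - j]
        err *= 1.0 - k * k
    return a[1:], err

def ar_spectrum(a, err, tau, freqs):
    """AR (maximum-entropy) spectrum err*tau / |1 - sum_k a_k exp(-i 2 pi f k tau)|^2."""
    k = np.arange(1, len(a) + 1)
    response = 1.0 - np.exp(-2j * np.pi * tau * np.outer(freqs, k)) @ a
    return err * tau / np.abs(response) ** 2

# Illustrative ACF of a first-order process, fitted with M = 2 and tau = 1.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
S = ar_spectrum(a, err, 1.0, np.linspace(-0.5, 0.5, 5))
```

Raising `order` beyond what the ACF estimates support is what introduces the unstable peaks and valleys described above.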

The  spectrum  that  fits  these  M  +  1  ACF  values  will  have  autocorrelation-function 
values  for  longer  lags  that  do  not  quite  match  the  available  ACF  estimates 
R((M+1)τ),  R((M+2)τ),  ...,  and  it  may  therefore  be  useful  to  adjust 
β_0,  ...,  β_M  to  reduce  the  mismatch  while  introducing  only  small  discrepancies  for 
R(0),  ...,  R(Mτ).  The  result,  however,  may  no  longer  be  describable  as  MRESE  but, 
rather,  as  curve  fitting  unless  an  information-theoretic  basis  can  be  devised  for  the 
foregoing  parameter  adjustment.  An  alternative  approach  would  be  to  use  an  average  of 
posterior  spectra  based  on  R(kτ),  R((k+1)τ),  ...,  R((k+M)τ)  for  k  =  0,  1,  .... 
Again,  the  value  of  each  approach  will  have  to  be  determined  by  its  success  in  applications. 
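The mismatch at longer lags can be made concrete. The autoregressive (maximum-entropy) spectrum fitted to R(0), ..., R(Mτ) implicitly extends the ACF by the recursion R(nτ) = a_1 R((n-1)τ) + ... + a_M R((n-M)τ) for n > M, where a_1, ..., a_M are the predictor coefficients of the fitted filter (notation introduced here). The sketch below assumes a uniform prior and a real-valued ACF, and its function names are hypothetical. It only measures the mismatch; it does not attempt the parameter adjustment discussed above.

```python
import numpy as np

def yule_walker(r, order):
    """Predictor coefficients a_1..a_M from R(0..M*tau) via the Toeplitz normal equations."""
    r = np.asarray(r, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[idx], r[1:order + 1])

def extend_acf(r, order, n_lags):
    """ACF implied at lags 0..n_lags by the AR(order) fit to r[0..order]."""
    a = yule_walker(r, order)
    ext = [float(v) for v in r[:order + 1]]
    for n in range(order + 1, n_lags + 1):
        # R(n) = a_1 R(n-1) + ... + a_M R(n-M)
        ext.append(float(np.dot(a, ext[n - 1:n - order - 1:-1])))
    return np.array(ext)

def long_lag_mismatch(r_est, order):
    """Available ACF estimates minus AR-implied values for lags order+1 .. len(r_est)-1."""
    r_est = np.asarray(r_est, dtype=float)
    implied = extend_acf(r_est, order, len(r_est) - 1)
    return r_est[order + 1:] - implied[order + 1:]

# Illustrative (made-up) estimates, fitted with M = 1.
print(long_lag_mismatch([1.0, 0.5, 0.3, 0.1], order=1))
```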
8.8.  Disregard  of  Available  Information 

MRESE  has  been  described  as  making  full  use  of  all  available  ACF  information, 
unlike  classical  spectral  analysis,  which  windows  the  ACF  to  make  it  fall  smoothly  to 
zero  at  lags  exceeding  the  length  T  of  the  available  record,  and  as,  in  effect,  determining 
values  for  the  ACF  of  longer  lags  in  a  manner  that  minimizes  the  amount  of  information 
they  imply  concerning  the  spectrum.  We  see,  however,  that  MRESE  throws  away  even 
more  information  than  classical  spectral  analysis,  viz.,  R(rτ)  for  r  >  M,  where  Mτ  is 
generally  small  compared  with  T,  and  instead  estimates  these  ACF  values.  It  does 
make  very  good  use  of  R(0),  ...,  R(Mτ)  and  is  thus  able  to  provide  better  spectra 
than  classical  methods  in  suitable  applications.  Some  of  the  foregoing  approaches  may, 
on  the  other  hand,  be  able  to  do  even  better  by  making  use  of  the  ACF  values  that  are 
redundant  for  autoregressive  processes  of  order  M  but  otherwise  contain  useful  information. 
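For comparison, a classical (Blackman-Tukey) estimate multiplies the ACF estimates up to some maximum lag by a lag window that tapers them toward zero and then Fourier-transforms the product. The sketch assumes a real, even ACF and a triangular (Bartlett) lag window, which is one common choice among several; the function name is hypothetical.

```python
import numpy as np

def blackman_tukey(r, max_lag, tau, freqs):
    """Lag-windowed (Blackman-Tukey) spectral estimate from R(0..max_lag*tau).

    Assumes a real, even ACF; uses a triangular (Bartlett) lag window
    w(r) = 1 - r/(max_lag + 1), which tapers the ACF toward zero.
    """
    r = np.asarray(r[:max_lag + 1], dtype=float)
    lags = np.arange(max_lag + 1)
    wr = (1.0 - lags / (max_lag + 1.0)) * r
    cosines = np.cos(2.0 * np.pi * tau * np.outer(freqs, lags[1:]))
    # S(f) = tau * [w(0) R(0) + 2 * sum_{r>=1} w(r) R(r tau) cos(2 pi f r tau)]
    return tau * (wr[0] + 2.0 * cosines @ wr[1:])
```

Here `max_lag` is typically a sizable fraction of the record length, whereas an order-M MRESE fit uses only lags 0 through M, which is the contrast drawn in the paragraph above.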

8.9.  Oversampling 

The  problem  of  having  too  many  ACF  values  (if  it  is  indeed  a  problem)  can  be 
exacerbated  by  reducing  the  interval  τ  between  the  samples  of  the  record  of  a  waveform 
of  duration  T.  While  the  most  convenient  sampling  interval  is  the  Nyquist  interval  when  the 
waveform  can  be  assumed  to  be  bandlimited,  the  spectral  density  may  fall  only  slowly 
toward  zero  with  increasing  frequency,  thus  raising  the  question  of  what  should  be  considered 
its  cut-off  frequency  and,  accordingly,  what  the  sampling  interval  τ  should  be.  A 
shorter  sampling  interval  ought  to  yield  at  least  as  much  information  about  the 
waveform  and  its  spectrum  as  a  longer  one,  but  some  modification  of  MRESE  may  be 
necessary  in  order  to  take  advantage  of  such  an  increase  in  the  amount  of  information 
available  without  thereby  introducing  false  and  unstable  spectral  peaks  and  valleys. 
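The growth in the number of ACF values with oversampling is simple arithmetic: a record of duration T sampled at interval τ contains about N = T/τ samples and yields estimates for lags up to (N - 1)τ, of which an order-M fit uses only lags 0 through M. The sketch below uses illustrative numbers and a hypothetical function name. It shows that halving τ roughly doubles the number of lags a fixed-order fit leaves unused, and that keeping the same lag span Mτ requires doubling M.

```python
def acf_lag_budget(T, tau, M):
    """Illustrative count of ACF lags for a record of duration T sampled at interval tau.

    Assumes N = T/tau samples, so lags 0..N-1 can be estimated; an order-M
    fit uses lags 0..M and ignores the remaining N-1-M.
    """
    n_samples = int(round(T / tau))
    max_lag = n_samples - 1
    return n_samples, max_lag, max_lag - M

# Illustrative values only: T = 1 s, M = 10, two sampling intervals.
for tau in (0.01, 0.005):
    print(tau, acf_lag_budget(T=1.0, tau=tau, M=10))
```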

9.  Summary 

We  thus  see  that,  despite  its  success  in  some  applications,  MRESE  might  benefit 
from  the  investigation  of  means  for  dealing  with: 

•  The  approximate  nature  of  the  available  values  of  the  autocorrelation  function, 

•  Use  of  the  information  contained  in  the  ACF  of  lags  beyond  those  now  utilized, 

•  The  superabundance  of  information  due  to  a  shorter  sampling  interval  τ, 

•  Posterior  spectra  of  the  form  (3)  and  (4)  satisfying  (5)  but  not  minimizing  (6), 

•  Nongaussian  waveforms, 

•  Specification  of  credentials  or  qualifications  to  be  required  of  prior  estimates, 

•  Absolute  rather  than  the  present  relative  weighting  of  prior  spectral  estimates, 

•  Logical  assignment  of  weights  to  prior  spectral  estimates,  and 

•  Determination  of  weights  for  the  posterior  spectra  when  used  as  priors. 
The  first  five  problems  listed  above  are  common  to  both  MRESE  and  maximum-entropy 
spectral  estimation;  hence,  any  solution  of  them  would  benefit  both.  Some  possible 
approaches  to  the  solution  of  several  of  the  foregoing  problems  have  been  presented 
in  the  preceding  sections.  In  Secs.  3  and  4  the  problem  of  assigning  a  prior  spectral  estimate 
for  a  sinusoidal  signal  has  been  solved  by  investigating  the  limit  as  the  weight 
given  to  that  estimate  is  allowed  to  approach  zero.  It  is  then  unnecessary  to  introduce 
either  a  prior  spectral  estimate  for  the  signal  or  a  weight  for  it.  In  Sec.  7  we  determined 
the  best  choice  of  gain  factors  for  prior  spectral  estimates  when  the  latter  are  subject  to 
unknown  scaling.  All  of  these  ideas  need  to  be  tested  experimentally  in  a  wide  variety  of 
applications,  along  with  the  standard  form  of  MRESE  and  other  methods  [14]-[19]  of 
spectral  analysis,  to  compare  them  as  to  resolution,  reliability,  and  computational 
requirements. 